An Anomaly Detection Mechanism for IEC 60870-5-104

The transformation of the conventional electricity grid into a new paradigm called the smart grid demands appropriate cybersecurity solutions. In this paper, we focus on the security of the IEC 60870-5-104 (IEC-104) protocol, which is commonly used by Supervisory Control and Data Acquisition (SCADA) systems in the energy domain. In particular, after investigating its security issues, we provide a multivariate Intrusion Detection System (IDS) which adopts both access control and outlier detection mechanisms in order to detect possible anomalies against IEC-104 in a timely manner. The efficiency of the proposed IDS is reflected by the Accuracy and F1 metrics, which reach 98% and 87%, respectively.


I. INTRODUCTION
Critical Infrastructures (CIs), and especially the electrical grid, constitute a frequent target of Advanced Persistent Threats (APTs). In particular, CIs are composed of legacy technologies characterised by severe security flaws. Moreover, although the rapid advance of the Internet of Things (IoT) introduces new beneficial characteristics to CIs, it simultaneously increases the attack surface due to the insecure nature of the Internet and specifically of the respective communication protocols [1].
In this paper, we focus on the Transmission Control Protocol (TCP)-based IEC 60870-5-104 (IEC-104) protocol, which is commonly utilised by Supervisory Control and Data Acquisition (SCADA) systems in Europe. IEC-104 uses TCP port 2404 and does not include sufficient authorisation mechanisms, thus allowing potential cybercriminals to violate IEC-104 communications either via unauthorised IEC-104 commands or Man in The Middle (MiTM) attacks [2]. Based on these security gaps, we provide a relevant Intrusion Detection System (IDS) which relies on essential access control rules and machine learning-based outlier detection mechanisms.
The rest of this paper is organised as follows. Section II discusses previous works related to the security of IEC-104. In Section III, we provide background on IEC-104 security and the various machine learning anomaly detection methods. Section IV is devoted to the architecture of the proposed IDS, while Section V evaluates its efficacy. Finally, Section VI concludes this paper.

II. RELATED WORK
Many authors have investigated the security issues of IEC-104. In particular, in [2], the authors provided a risk assessment model for IEC-104 communications, taking into account a Coloured Petri Net (CPN)-based threat assessment model as well as the risk assessment model of AlienVault OSSIM [3]. In [4], P. Maynard et al. focused on possible MiTM and replay attacks against IEC-104, also covering the corresponding injection commands. Accordingly, in [5], C. Lin and S. Nadjm-Tehrani analysed IEC-104 traffic patterns, aiming at discovering underlying timing patterns of spontaneous events. In [6]

A. IEC 60870-5-104 Security Issues
The functionality of IEC-104 relies on TCP/IP, which exhibits a number of cybersecurity issues. Although IEC 62351 [9] provides sufficient guidelines that can enhance the security of IEC-104, the industrial nature of SCADA systems hinders their immediate upgrade. A severe security issue of IEC-104 is the transmission of data without any encryption mechanism, which makes traffic analysis and MiTM attacks possible. In addition, many IEC-104 commands, such as reset, interrogation and read commands, do not integrate authentication and authorisation procedures, thereby allowing unauthorised access. This vulnerability is crucial since a cyberattacker is capable of controlling the field devices and, possibly, the overall operation of the infrastructure.

B. Machine Learning Algorithms Background
In this section, a short overview of the anomaly detection methods based on machine learning solutions is provided. A more comprehensive literature review can be found in recent surveys [10], [11]. The machine learning methods for anomaly detection can be separated into model-based, clustering-based, reconstruction-based and proximity-based approaches. Model-based approaches include the Gaussian Mixture Models (GMM) [12], which fit the whole dataset to a mixed Gaussian distribution. The GMM parameters are usually estimated with Expectation-Maximization solutions or deep estimation networks.
Attribute-based approaches for anomaly detection assume that each feature of a normal example can be predicted from the remaining features. The Isolation Forest algorithm, in turn, finds anomalies by deliberately overfitting models that memorise each data point. Because outliers have more empty space around them, they take fewer steps to memorise. Many anomaly detection methods are considered clustering-based detectors, assuming that the normal data are located close to their nearest cluster.
Principal Component Analysis (PCA), Matrix Factorization (MF), Stochastic Outlier Selection (SOS) and deep Auto-encoders belong to the reconstruction-based approaches. The concept behind these methods is to learn a mapping from a higher to a lower-dimensional space through compression and decompression stages and to identify points with a high reconstruction error as anomalies. SOS is an unsupervised anomaly-selection algorithm that takes as input either a feature matrix or a dissimilarity matrix and outputs an anomaly probability for each data point. Intuitively, a data point is considered to be an anomaly when the other data points have an insufficient affinity with it.
One-Class Support Vector Machine (OC-SVM) aims to find a hyperplane that separates the vast majority of the data from the origin in the projected high-dimensional space, without making any assumptions about their distribution. In particular, OC-SVM separates all the data points from the origin (in feature space) and maximises the distance from this hyperplane to the origin. This results in a binary function which captures the regions of the input space where the probability density of the data lives. The idea of OC-SVM for anomaly detection is thus to find a function that is positive for regions with a high density of points and negative for regions with a low density.
Proximity-based methods do not require any training or assumptions about the dataset. They consider the rarity of a point, measuring, for example, the distance to its K-Nearest Neighbours (KNN) or the ratio of local reachability densities, as in the Local Outlier Factor (LOF).
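To make the reconstruction-based idea concrete, the following is a minimal sketch using PCA computed through a singular value decomposition. It is an illustration only and not the implementation used in this paper; the function name pca_reconstruction_errors and the toy data are assumptions introduced here.

```python
import numpy as np


def pca_reconstruction_errors(X, n_components):
    """Squared reconstruction error of each row after a PCA round trip.

    Hypothetical helper for illustration: compress to n_components
    dimensions, decompress, and measure how much of each point is lost.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]
    compressed = centred @ basis.T      # compression stage
    restored = compressed @ basis       # decompression stage
    return np.sum((centred - restored) ** 2, axis=1)


# Toy data (assumption): 20 points on the line y = 2x plus one off-line point.
points = [(i, 2 * i) for i in range(20)] + [(5, 20)]
errors = pca_reconstruction_errors(points, n_components=1)
most_anomalous = int(np.argmax(errors))  # index of the worst-reconstructed point
```

In this toy case the point that does not follow the dominant linear trend receives the largest reconstruction error. In practice, a threshold on the error would have to be chosen, for example from a validation set.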
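As a small illustration of a proximity-based score, the sketch below assigns to each point the distance to its k-th nearest neighbour, so that isolated points obtain large scores. The function name knn_distance_scores and the sample points are hypothetical and serve only as an example.

```python
import math


def knn_distance_scores(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour.

    Hypothetical helper: no model is trained, the score is computed
    directly from pairwise distances (quadratic cost in the number of points).
    """
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Toy data (assumption): four clustered points and one distant point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_distance_scores(sample, k=2)
rarest = max(range(len(sample)), key=scores.__getitem__)
```

The distant point receives by far the largest score. The choice of k is a tuning parameter and is assumed here for illustration.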

A. Network Traffic Monitoring Module
The Network Traffic Monitoring Module relies on the Scapy library [13] and is responsible for monitoring and capturing the overall network traffic at a user-defined frequency.
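A periodic capture of this kind could be sketched with Scapy as follows. This is a hedged illustration rather than the actual module: the function names, the restriction to TCP port 2404 and the idea of writing each capture window to a pcap file are assumptions made here, and running the capture requires Scapy and sufficient privileges.

```python
IEC104_PORT = 2404  # TCP port used by IEC-104


def iec104_bpf(port=IEC104_PORT):
    """Build a BPF capture filter that keeps only traffic on the IEC-104 port."""
    return f"tcp port {port}"


def capture_window(iface, seconds, pcap_path):
    """Capture one window of traffic and store it in a pcap file.

    Hypothetical helper: Scapy is imported lazily so that the module can be
    loaded on hosts without capture privileges.
    """
    from scapy.all import sniff, wrpcap

    packets = sniff(iface=iface, filter=iec104_bpf(), timeout=seconds)
    if packets:
        wrpcap(pcap_path, packets)
    return len(packets)
```

For example, calling capture_window("eth0", 30, "window.pcap") in a loop would produce one file per 30-second window; the interface name and window length are placeholders to be set by the user.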

B. Network Packet Access Control Module
This module receives the captured network traffic from the previous module and utilises Scapy [13] in order to apply some initial security controls. In particular, it adopts a whitelist in which all legitimate Medium Access Control (MAC) and Internet Protocol (IP) addresses are stored. Therefore, if a packet contains a MAC or an IP address which is not included in the whitelist, a security event is generated and stored in the Elasticsearch database of the Server. The legitimate MAC and IP addresses should be defined by the system operator or the security administrator. In addition, the whitelist also defines the permitted TCP and UDP ports. Therefore, if a packet includes a non-legitimate port, the corresponding security event is generated.
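The whitelist check can be illustrated with the following sketch, which operates on packet fields that are assumed to have been extracted already (for instance with Scapy). The Whitelist class, the dictionary layout of a packet and the event messages are hypothetical. The port rule shown here, which accepts a packet when at least one endpoint uses a permitted port, is also an assumption made to tolerate ephemeral client ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Whitelist:
    """Legitimate addresses and ports, as defined by the operator."""
    macs: frozenset
    ips: frozenset
    ports: frozenset


def check_packet(pkt, wl):
    """Return a list of security event messages for one packet (empty if clean)."""
    events = []
    for side in ("src", "dst"):
        mac = pkt[f"{side}_mac"].lower()
        if mac not in wl.macs:
            events.append(f"non-whitelisted MAC address {mac} ({side})")
        ip = pkt[f"{side}_ip"]
        if ip not in wl.ips:
            events.append(f"non-whitelisted IP address {ip} ({side})")
    # Assumption: one endpoint (the server side) must use a permitted port.
    if pkt["src_port"] not in wl.ports and pkt["dst_port"] not in wl.ports:
        events.append(
            f"non-whitelisted ports {pkt['src_port']} -> {pkt['dst_port']}"
        )
    return events


wl = Whitelist(
    macs=frozenset({"aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"}),
    ips=frozenset({"10.0.0.1", "10.0.0.2"}),
    ports=frozenset({2404}),
)
legit = {"src_mac": "AA:BB:CC:00:00:01", "dst_mac": "aa:bb:cc:00:00:02",
         "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
         "src_port": 50000, "dst_port": 2404}
rogue = dict(legit, src_mac="de:ad:be:ef:00:00", src_ip="10.0.0.99", dst_port=8080)
legit_events = check_packet(legit, wl)
rogue_events = check_packet(rogue, wl)
```

In a deployment, each returned message would be turned into a security event and indexed in Elasticsearch; that step is omitted from this sketch.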

C. IEC-104 Flows Extraction Module
This module receives the captured network packets and exports the corresponding bi-directional IEC-104 flows, utilising the CICFlowMeter software [14]. In particular, CICFlowMeter generates 83 features for each flow, which are stored in a different index of the Elasticsearch database. It is also noteworthy that different flow-timeout thresholds can be used for extracting the corresponding IEC-104 flows, which in turn affects the values of the 83 features [14].
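The effect of the flow-timeout threshold can be illustrated with a simplified flow grouper. This sketch is not CICFlowMeter: it assumes that a flow is closed once its duration exceeds the timeout, ignores TCP flags, and computes only a few counters. The Flow class, the extract_flows function and the packet tuple layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A bi-directional flow; 'forward' is the direction of its first packet."""
    initiator: tuple
    start: float
    fwd_packets: int = 0
    bwd_packets: int = 0
    fwd_bytes: int = 0
    bwd_bytes: int = 0


def extract_flows(packets, timeout_s):
    """Group (ts, src_ip, src_port, dst_ip, dst_port, size) tuples into flows."""
    active, finished = {}, []
    for ts, src_ip, src_port, dst_ip, dst_port, size in sorted(packets):
        src, dst = (src_ip, src_port), (dst_ip, dst_port)
        key = (min(src, dst), max(src, dst))  # same key for both directions
        flow = active.get(key)
        if flow is not None and ts - flow.start > timeout_s:
            finished.append(flow)             # simplified timeout rule
            flow = None
        if flow is None:
            flow = active[key] = Flow(initiator=src, start=ts)
        if src == flow.initiator:
            flow.fwd_packets += 1
            flow.fwd_bytes += size
        else:
            flow.bwd_packets += 1
            flow.bwd_bytes += size
    finished.extend(active.values())
    return finished


pkts = [
    (0.0, "10.0.0.1", 50000, "10.0.0.2", 2404, 60),
    (1.0, "10.0.0.2", 2404, "10.0.0.1", 50000, 80),
    (20.0, "10.0.0.1", 50000, "10.0.0.2", 2404, 60),
]
flows_15s = extract_flows(pkts, timeout_s=15)
flows_120s = extract_flows(pkts, timeout_s=120)
```

With the 15-second threshold the three packets are split into two flows, whereas the 120-second threshold yields a single flow with different packet and byte counts, which shows why the threshold changes the resulting feature values.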

D. Anomaly Detection Module
The Anomaly Detection Module constitutes the core of the proposed IDS. First, it receives the captured IEC-104 flows from the Elasticsearch database and applies outlier detection models in order to detect which of them are anomalies. The efficacy of these models is discussed in Section V. Finally, it stores the corresponding security events (i.e., anomalous IEC-104 flows) in a different index of the Elasticsearch database.
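As a hedged illustration of this step, the sketch below fits the three outlier detection algorithms evaluated in Section V (OC-SVM, Isolation Forest and LOF) with scikit-learn and flags anomalous feature vectors. The hyperparameters, the synthetic four-dimensional data and the function names are assumptions made for this example and do not reflect the configuration or the training data of the proposed IDS.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_detectors(train):
    """Fit a scaler and three outlier detectors on reference flow features."""
    scaler = StandardScaler().fit(train)
    scaled = scaler.transform(train)
    detectors = {
        "OC-SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
        "Isolation Forest": IsolationForest(n_estimators=100, random_state=0),
        # novelty=True allows LOF to score flows unseen during fitting.
        "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True),
    }
    for detector in detectors.values():
        detector.fit(scaled)
    return scaler, detectors


def flag_anomalies(scaler, detectors, flows):
    """Return, per detector, a boolean mask that is True for anomalous flows."""
    scaled = scaler.transform(flows)
    # scikit-learn detectors predict -1 for outliers and +1 for inliers.
    return {name: det.predict(scaled) == -1 for name, det in detectors.items()}


# Synthetic stand-in for flow features (assumption, not real IEC-104 data).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(300, 4))
suspicious = np.full((3, 4), 8.0)
scaler, detectors = fit_detectors(reference)
flags = flag_anomalies(scaler, detectors, suspicious)
```

Each flagged flow would then be stored as a security event in the corresponding Elasticsearch index; that step is outside the scope of this sketch.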

E. Response Module
The Response Module informs the user about the various security events via Kibana of the Elastic Stack. Moreover, it provides statistical charts that help the user to better understand the security status of the infrastructure. Regarding the security events, the format of AlienVault OSSIM [3], [15] was utilised. In particular, the security events detected by the proposed IDS are related to the controls of the Network Packet Access Control and Anomaly Detection Modules.

V. EVALUATION ANALYSIS
This section is devoted to the efficacy of the outlier detection models of the Anomaly Detection Module. In particular, three outlier detection algorithms were evaluated, namely a) OC-SVM, b) Isolation Forest and c) Local Outlier Factor (LOF), to detect anomalous IEC-104 flows under four different flow-timeout thresholds: 15, 30, 60 and 120 seconds. In order to train the corresponding models, we combined normal IEC-104 data stemming from a real substation with the malicious IEC-104 data of [16]. Moreover, utilising the PCA method, we chose only eight features from the 83 generated by CICFlowMeter, namely a) the total packets in the forward direction, b) the total size of the packets in the backward direction, c) the standard deviation of the packet size in the forward direction, d) the number of flow bytes per second, e) the maximum time between two packets sent in the flow, f) the minimum length of a packet, g) the average number of bytes in a sub-flow in the backward direction and h) the maximum time a flow was active before becoming idle. It is worth mentioning that the previous features are related only to the IEC-104 packets, since Scapy [13] and CICFlowMeter were configured to capture and extract only IEC-104 flows, respectively. Tables I-IV
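For reference, the Accuracy and F1 metrics used in this evaluation can be computed from the confusion counts as in the following sketch. The labels below are made-up toy values and not results of this paper; the function names are hypothetical, and anomalous flows are assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives; returns (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all flows that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy labels (assumption): 2 anomalous flows among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
f1 = f1_score(*counts)
```

Because F1 does not reward true negatives, it can be noticeably lower than Accuracy when normal flows dominate the dataset, as this toy example shows.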

VI. CONCLUSIONS
The continuous progression and involvement of IoT in the industrial domain and especially in the electrical grid requires the presence of appropriate cybersecurity measures. In this paper, we focused our attention on the security of the IEC-104 protocol, which is commonly utilised by SCADA systems. In particular, after investigating IEC-104 security issues, we provided a relevant IDS, which applies access control and outlier detection mechanisms in order to detect IEC-104 anomalies. The performance of the proposed IDS is demonstrated through the evaluation analysis, where Accuracy and F1 score reach 98% and 87%, respectively.

VII. ACKNOWLEDGEMENT
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 787011 (SPEAR).